Automated Transport Mode Detection of GPS Tracking Data

Author

Cyril Geistlich, Micha Franz

Abstract

This project investigates key factors and features in GPS tracking data to differentiate transportation modes. Machine learning is applied to automate transportation mode detection using spatial, temporal, and attribute analysis. Manual verification of the results ensures accuracy. The findings contribute to computational movement analysis and automated transportation mode detection.

1. Introduction

In recent years, the spread of GPS-enabled devices and progress in location-based technologies have generated vast amounts of GPS tracking data. This data holds significant potential for extracting insights and improving our understanding of human mobility patterns. One main application in this field is the differentiation of transportation modes, which can benefit various domains such as traffic management or urban planning. Determining the mode of transportation from GPS tracking data presents several challenges. With the ubiquitous increase of GPS tracking through smartphones and other devices, manually annotating data is too time-consuming and expensive, and it is also prone to human error and bias. This leads to the following two research questions:

What are the key factors and features that can be extracted from GPS tracking data to differentiate between different types of transportation modes?

How can machine learning techniques be applied to GPS tracking data to automate the detection of the mode of transportation and which accuracies can be achieved by different machine learning algorithms?

The project focuses on exploring spatial and temporal aspects to extract key factors from GPS tracking data, such as velocity, sinuosity, or angles. Additionally, spatial context in the form of traffic networks and land cover is added to the data in order to improve the accuracy of transportation mode detection. Machine learning algorithms are tested and employed to automate the classification of transportation modes. The aim is to find an accurate algorithm by training and evaluating different models, such as random forests, support vector machines, or neural networks, on labelled data. To ensure the accuracy of the models, a subset of the classified data is used to validate the performance. By comparing the results of the automated classification with ground-truth data, the project aims to assess the accuracies achieved by different machine learning algorithms and to identify areas for improvement.
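The training-and-validation workflow described above can be sketched with the caret package, which is also loaded later in this document. This is a minimal, illustrative example only: the built-in iris data, the 80/20 split, and the random-forest method stand in for the project's labelled data and actual model setup (method = "rf" additionally requires the randomForest package).

```r
library(caret)

# Hold out 20% of the labelled data for validation
set.seed(42)
df <- iris # stand-in for the manually labelled tracking data
idx <- createDataPartition(df$Species, p = 0.8, list = FALSE)

# Train a random forest with 5-fold cross-validation on the training split
fit <- train(Species ~ ., data = df[idx, ], method = "rf",
             trControl = trainControl(method = "cv", number = 5))

# Compare predictions against the held-out ground truth
confusionMatrix(predict(fit, df[-idx, ]), df[-idx, ]$Species)
```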

2. Data

The main data are the GPS tracking data, which were recorded through the Posmo app via smartphone over a time span of approximately 1.5 months by the two authors, complemented by data from the available data pool. The complete collected data was manually labelled to ensure a valid ground truth. Further, spatial context data was used: the Swiss road network, the tram network,1 the train network2 and the bus networks of the cantons of Zurich3 and Bern.4 (Note: according to the federal bureau of transport, there is no official data set covering the entire Swiss bus network. Thus the available data sets for Bern and Zurich were used, where a significant amount of data points pertaining to bus usage was collected.) To facilitate the detection of the transportation mode boat, land cover data containing all Swiss waters was also used.5

Code
library("dplyr")
library("sf")
library("readr") 
library("ggplot2")
library("mapview")
library("lubridate")
library("zoo") 
library("caret")
library("LearnGeom") # to calculate angle
library("geosphere") # to calculate distances
library("RColorBrewer") # to create custom color palettes
library("ggcorrplot") # for correlation matrix
library("ROSE")
library("gridExtra")
Code
# creates lines out of points, used for visualisation purposes
point2line <- function(points){
  geometries <- st_cast(st_geometry(points %>% select(geometry)), "POINT")
  n <- length(geometries) - 1
  linestrings <- lapply(X = 1:n, FUN = function(x) {
    pair <- st_combine(c(geometries[x], geometries[x + 1]))
    line <- st_cast(pair, "LINESTRING")
    return(line)
  })
  
  # bind the single-segment linestrings into one data frame and convert to sf
  df <- do.call(rbind, lapply(linestrings, data.frame))
  sf_lines <- df %>% st_as_sf()
  return(sf_lines)
}

# returns the number of unique values in a vector
un_col <- function(df){
  return(length(unique(df)))
}
Code
# read personal tracking data
posmo_micha_truth_csv <- read.delim("data/manually_labelled/posmo_20230502_to_20230613_m.csv",sep=",") 
posmo_cyril_truth_csv <- read.delim("data/manually_labelled/posmo_2023-05-01T00_00_00+02_00-2023-06-26T23_59_59+02_00.csv",sep=",") 
posmo_micha_csv <- read.delim("data/posmo_labelled/posmo_20230502_to_20230613_p.csv",sep=",") 


# read tracking data from pool
posmo_pool_1 <- read.delim("data/manually_labelled/posmo.csv",sep=",") %>% tail(612) # the first 250 data points are not correctly labelled, keep only the last 612
posmo_pool_2 <- read.delim("data/manually_labelled/posmo_2.csv",sep=",") 
posmo_pool_3 <- read.delim("data/manually_labelled/posmo_BuJa.csv",sep=",") 
Code
# read context data
tram_netz <- read_sf("data/tlm_oev_eisenbahn/tlm_oev_eisenbahn_tram.shp") %>%
  select(id, geometry) %>% st_zm() %>% st_buffer(10) %>% st_union
zug_netz <- read_sf("data/tlm_oev_eisenbahn/tlm_oev_eisenbahn_bahn.shp") %>% 
  select(id, geometry) %>% st_zm() %>% st_buffer(10) %>% st_union
gewaesser <- read_sf("data/tlm_bb_gewaesser/tlm_bb_gewaesser_larger20000.shp") %>% 
  select(id, geometry) %>% st_zm() %>% st_union
strassen <- read_sf("data/tlm_strassen/strassen_AOI.shp") %>% 
  select(id, geometry) %>% st_zm() %>% st_buffer(10) %>% st_union

bus_netz_zvv <- read_sf("data/zvv_netz/Linien_des_offentlichen_Verkehrs_-OGD.gpkg", layer="ZVV_LINIEN_L")
bus_netz_zvv <- bus_netz_zvv[grepl("bus|Bus", bus_netz_zvv$BETRIEBSZWEIG_TXT), ] %>% rename(geometry = geom) %>%
  select(geometry) %>% st_buffer(10) %>% st_union()

bus_netz_bern <- read_sf("data/geoinformation_bern/OEVTP_LINIE.shp") %>% filter(VERKMITT_B == "Bus") %>% select(geometry) %>%st_zm() %>% st_buffer(10) %>% st_union()

bus_netz <- st_union(bus_netz_bern, bus_netz_zvv)
Code
process_posmo_data <- function(posmo_data) { # function with data cleaning steps

  # Convert to sf object
  posmo_data <- posmo_data %>%
    st_as_sf(coords = c("lon_x", "lat_y"), crs = 4326) %>%
    st_transform(crs = 2056)
  
  # Remove unwanted columns
  posmo_data <- posmo_data[, -c(1, 3, 4)]
  
  # Parse timestamp and shift from UTC to local time (UTC+2)
  posmo_data$datetime <- ymd_hms(posmo_data$datetime) + hours(2)
  
  # Add ID to rows
  posmo_data <- posmo_data %>%
    mutate(id = row_number())
  
  # remove duplicate time values
  posmo_data <- posmo_data[!duplicated(posmo_data$datetime), ]
  
  # remove subsequent duplicate location (person wasn't moving)
  posmo_data <- posmo_data %>% 
    filter(geometry != lead(geometry))
  
  return(posmo_data)
}

3. Methods

3.1 Preprocessing

When tracking a person throughout the day using GPS data, there are instances where the person appears to be stationary, such as when in an office or at a university. However, due to GPS inaccuracies, these stationary points may not appear at the exact same location and can exhibit erratic movement patterns. The accuracy of GPS signals is often compromised in densely built-up areas, amplifying this phenomenon. Figure XXX (screenshot still to be added) shows an example of this phenomenon around the Irchel campus of the University of Zurich. As a result, parameters like velocity and step length can take values that are typically associated with other categories. To address this issue, two approaches were employed.

The first approach involves analyzing the angles between consecutive points. Typically, these angles are significantly smaller for stationary points than for other movements. A threshold angle was determined visually, and the data set is filtered to remove all data points with angles smaller than 60°. This process needs to be repeated iteratively until no angles below the threshold remain, as the removal of data points alters the angles between the remaining points. Figure XXX (screenshot still to be added) shows several iterations, removing more and more points with an angle below the threshold. One problem with this approach is that small angles can also emerge naturally rather than from an error; such points are then falsely removed. A special case of this problem occurs with U-turns or sharp changes in direction, where the angle between the points just before and after the turn may indeed be small. Consequently, the removal of these points leads to a re-calculation of angles, which can result in the subsequent removal of additional points and the loss of significant segments. Figure XXX demonstrates this phenomenon. However, visual inspection of a representative amount of data showed that this occurs only rarely.

The second approach considers the distance between the current point and a set number of preceding and subsequent points. A point is deemed static if the maximum distance between that point and any of these neighbouring points falls below a predefined distance threshold. However, this approach may unintentionally remove non-static data points, particularly when a person is walking slowly and numerous data points are recorded within a small distance. Adjusting the distance threshold or the number of preceding and subsequent points can mitigate this issue, but it requires striking a balance between filtering out false movements and retaining genuine data. The sampling rate of Posmo was set to 10 or 15 seconds, but in some cases data points were recorded every three seconds, which obviously increases the chance of removing data in the way just described. Since this behavior was not expected and was only discovered late in the process, the points exhibiting an abnormally short sampling interval were not removed prior to preprocessing.

Finding the optimal compromise between these filtering approaches involves considering the specific characteristics of the tracked person’s movements and the quality of the GPS data. By iteratively applying the angle-based filtering and analyzing the distance to surrounding points, a more accurate identification of stationary periods can be achieved, mitigating the impact of GPS inaccuracies and preserving the integrity of the tracking data. Thus, the thresholds were set by trial and error.
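The trial-and-error threshold search can be sketched as follows. The distances here are synthetic stand-ins for the max-distance values computed per point by the distance-based filter, and the candidate thresholds are purely illustrative:

```r
# Synthetic stand-in distances: small jitter for static points,
# larger values for genuine movement
set.seed(1)
distances <- c(rexp(200, rate = 1/5),          # static GPS jitter
               runif(300, min = 40, max = 400)) # real movement

# Sweep candidate thresholds and inspect how many points each would keep
for (th in c(30, 60, 90)) {
  kept <- sum(distances >= th)
  cat(sprintf("threshold %2d m keeps %3d of %d points\n",
              th, kept, length(distances)))
}
```

In the actual workflow, the same sweep would be run on the distances returned by filterStaticByDistance() and the filtered tracks inspected visually, e.g. with mapview.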

Code
filterStaticByDistance <- function(data, threshold_distance, consecutive_points) {
  require(geosphere)
  
  # transform to WGS84, necessary to calculate distance using geosphere
  data <- data %>% st_transform(4326)
  
  # Extract coordinates from the geometry
  coords <- data.frame(st_coordinates(data))
  data$longitude <- coords$X
  data$latitude <- coords$Y
  
  # Calculate distances to preceding and consecutive points
  distances <- numeric(nrow(data))
  for (i in (consecutive_points + 1):(nrow(data) - consecutive_points)) {
    next_points <- coords[(i + 1):(i + consecutive_points), ]
    prev_points <- coords[(i - 1):(i - consecutive_points), ]
    all_points <- rbind(next_points, prev_points)
    distances[i] <- max(geosphere::distGeo(coords[i, ], all_points))
  }
  
  # Filter out points where the maximum distance exceeds the threshold
  filtered_data <- data[distances >= threshold_distance | distances == 0, ] # keep first/last values which are 0
  
  # Transform back to LV95
  filtered_data <- filtered_data %>% st_transform(2056)
  
  return(list(filtered_data = filtered_data, distances = distances)) # distances are just needed for testing thresholds
}
Code
getAngle <- function(coords) {
  angles <- numeric(nrow(coords)) # Initialize angles as a numeric vector
  angles[1] <- NA # the first point can't have an angle

  for (i in 2:(nrow(coords) - 1)) { # calculate the angle for 3 consecutive points, similar to lag/lead
    angle <- Angle( # function from library LearnGeom
      c(coords[i - 1, "X"], coords[i - 1, "Y"]),
      c(coords[i, "X"], coords[i, "Y"]),
      c(coords[i + 1, "X"], coords[i + 1, "Y"])
    )
    angles[i] <- angle # Assign the calculated angle to the corresponding index in angles
  }
  angles[nrow(coords)] <- NA # the last point can't have an angle
  return(angles)
}
Code
filterStaticByAngle <- function(working_dataset, angleThreshold){
  coords <- data.frame(st_coordinates(working_dataset), working_dataset$id)  
  working_dataset$angle <- getAngle(coords)
  min_angle <- min(working_dataset$angle, na.rm = T)


  while (min_angle <= angleThreshold) { # iteratively filter out tight angles until none below the threshold are left
    working_dataset <- working_dataset %>% filter(is.na(angle) | angle > angleThreshold) # keep first and last value (= NA)
    coords <- data.frame(st_coordinates(working_dataset), working_dataset$id)
    working_dataset$angle <- getAngle(coords)
    min_angle <- min(working_dataset$angle, na.rm = T)
  }
  return(working_dataset)
}
Code
# result <- filterStaticByDistance(working_dataset, threshold_distance = 60, consecutive_points = 5)
# filteredByDistance <- result$filtered_data
# working_dataset$distances <- result$distances # just for testing threshold values
# filteredByAngle <- filterStaticByAngle(working_dataset, 60)
# filtertedByDistance_and_Angle <- filterStaticByAngle(filteredByDistance, 60)
Code
# filtered_data_line <- point2line(filtertedByDistance_and_Angle)
# filtered_angle_line <- point2line(filteredByAngle)
# unfiltered_data_line <- point2line(working_dataset)
# 
# pal_blue <- colorRampPalette(c("cyan", "blue"))
# pal_orng <- colorRampPalette(c("yellow", "red"))
# 
# 
# mapview(working_dataset, alpha.regions=1.0, col.regions = "#901010") +
#   mapview(unfiltered_data_line, color="#901010", lwd=2) +
#   mapview(filtertedByDistance_and_Angle, alpha.regions=1.0, col.regions="#008940") +
#   mapview(filtered_data_line, color="#008940", lwd=2) 
# 
# 
# mapview(working_dataset, alpha.regions=1.0, col.regions = "#901010", cex=4) +
#   mapview(unfiltered_data_line, color="#901010", lwd=2) +
#   mapview(filteredByAngle, alpha.regions=1.0, col.regions="orange", cex=4) +
#   mapview(filtered_angle_line, color="orange", lwd=2) 
# 
# mapview(filtered_data_line, color="#008940", lwd=2) +
#   mapview(unfiltered_data_line, color="#901010", lwd=2)

3.2 Movement Parameters

As variables for the machine learning models, the following movement parameters were calculated:

  • Step length

  • Velocity

  • Acceleration

  • Time difference

  • Sinuosity

  • Angle
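Among these, sinuosity is the least self-explanatory: it is the ratio of the travelled path length to the straight-line distance between the segment's endpoints, so a value of 1 means a perfectly straight path. A minimal numeric sketch with synthetic coordinates (not project data):

```r
# Three points forming a right angle: (0,0) -> (3,0) -> (3,4)
x <- c(0, 3, 3)
y <- c(0, 0, 4)

path_length <- sum(sqrt(diff(x)^2 + diff(y)^2))        # 3 + 4 = 7
direct_dist <- sqrt((x[3] - x[1])^2 + (y[3] - y[1])^2) # 5

path_length / direct_dist # sinuosity: 1.4
```

In the function below, the same ratio is computed over a moving window of seven consecutive points.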

Code
calc_movement_params <- function(working_dataset){

  # Create Coord Column
  coords <- data.frame(st_coordinates(working_dataset), working_dataset$id)

  # Calculate time difference between steps (diff_s), step length and velocity
  working_dataset <- working_dataset |> 
  mutate(diff_s = as.numeric(difftime(lead(datetime), datetime, units = "secs"))) |>
  mutate(steplength = ((coords$X - lead(coords$X))^2 + (coords$Y - lead(coords$Y))^2)^0.5) |>
  mutate(velocity = as.numeric(steplength/diff_s)) |>
  filter(diff_s != 0)

  coords <- data.frame(st_coordinates(working_dataset), working_dataset$id)
  
  #Calculate Moving Window Step length
  working_dataset <- working_dataset |>
  mutate(
    step_mean = rowMeans(
      cbind(
        sqrt((lag(coords$X, 3) - coords$X)^2 + (lag(coords$Y, 3) - coords$Y)^2),
        sqrt((lag(coords$X, 2) - coords$X)^2 + (lag(coords$Y, 2) - coords$Y)^2),
        sqrt((lag(coords$X, 1) - coords$X)^2 + (lag(coords$Y, 1) - coords$Y)^2),
        sqrt((coords$X - lead(coords$X, 1))^2 + (coords$Y - lead(coords$Y, 1))^2),
        sqrt((coords$X - lead(coords$X, 2))^2 + (coords$Y - lead(coords$Y, 2))^2),
        sqrt((coords$X - lead(coords$X, 3))^2 + (coords$Y - lead(coords$Y, 3))^2)
      )
    )
  )
  
  # Calculate moving window diff_s
  working_dataset <- working_dataset |>
  mutate(diff_s_mean = as.numeric(difftime(lead(datetime, 3), lag(datetime, 3), units = "secs"))/6)

  #Calculate Moving Window velocity
  working_dataset <- working_dataset |>
  mutate(velocity_mean = as.numeric(step_mean/diff_s_mean))
 
  # Infinite values are handled later (converted to NA and removed)
  coords <- data.frame(st_coordinates(working_dataset), working_dataset$id)

  # Acceleration stepwise (note: computed as velocity divided by the preceding
  # time step, not as the change in velocity over time)
  working_dataset$acceleration <- working_dataset$velocity/lag(working_dataset$diff_s)

  # Calculate acceleration using a moving window
  working_dataset <- working_dataset |>
  mutate(
    acceleration_mean = rowMeans(
      cbind(
        lag(working_dataset$acceleration,3),
        lag(working_dataset$acceleration,2),
        lag(working_dataset$acceleration,1),
        working_dataset$acceleration,
        lead(working_dataset$acceleration,1),
        lead(working_dataset$acceleration,2),
        lead(working_dataset$acceleration,3)
      )
    )
  )
  
  # Calculate sinuosity: moving-window path length divided by the euclidean
  # distance between the first and last point of the window
  working_dataset <- working_dataset |>
  mutate(
    sinuosity = 
      ( # Path Length/Direct distance between first and last point
        sqrt((lag(coords$X, 3) - lag(coords$X, 2))^2 + (lag(coords$Y, 3) - lag(coords$Y, 2))^2) +
        sqrt((lag(coords$X, 2) - lag(coords$X, 1))^2 + (lag(coords$Y, 2) - lag(coords$Y, 1))^2) +
        sqrt((lag(coords$X, 1) - coords$X)^2 + (lag(coords$Y, 1) - coords$Y)^2) +
        sqrt((coords$X - lead(coords$X, 1))^2 + (coords$Y - lead(coords$Y, 1))^2) +
        sqrt((lead(coords$X, 1) - lead(coords$X, 2))^2 + (lead(coords$Y, 1) - lead(coords$Y, 2))^2) +
        sqrt((lead(coords$X, 2) - lead(coords$X, 3))^2 + (lead(coords$Y, 2) - lead(coords$Y, 3))^2)
      ) 
      / sqrt((lag(coords$X, 3) - lead(coords$X, 3))^2 + (lag(coords$Y, 3) - lead(coords$Y, 3))^2)
  ) 

  return(working_dataset)
}
Code
run_all <- function(df){
  df <- process_posmo_data(df)
  result <- filterStaticByDistance(df, threshold_distance = 60, consecutive_points = 5)
  result$filtered_data
  filtered_by_angle <- filterStaticByAngle(result$filtered_data, 60)
  df <- calc_movement_params(filtered_by_angle)
  return(df)
}
Code
working_dataset <- run_all(posmo_micha_truth_csv)
processed_1 <- run_all(posmo_cyril_truth_csv)
processed_2 <- run_all(posmo_pool_1)
processed_3 <- run_all(posmo_pool_2)
processed_4 <- run_all(posmo_pool_3)

# Combine data sets
working_dataset <- rbind(working_dataset, processed_1)
posmo_pool <- rbind(processed_2,processed_3,processed_4)

3.3 Contextual Data

Due to similar movement parameters across transportation modes, it is a particularly challenging task to automatically classify transportation modes using only movement parameters. Bus and tram in cities, for example, exhibit very similar characteristics. To facilitate the classification task, the data was enriched with spatial context data in the form of various networks and land cover, as mentioned in the data description. By incorporating this spatial context data, the classification process can be enhanced by considering the surrounding environment in which the transportation modes operate. For every data point, the closest distance to the different networks and water bodies was calculated. In some cases, the calculated distances to these networks or water bodies could be extremely large. Including such large values in the data set would lead to a significant span of values, potentially overshadowing smaller differences within cities. To avoid this issue, a maximum distance of 100 m was set: any distance beyond 100 m was assigned a value of 100 m. This threshold ensures that distances beyond 100 m are treated as equal, effectively reducing the influence of extremely large distances on the classification task.

It is important to note that the distance calculation may not always provide an accurate representation of real distances, especially in cases involving tunnels or underground passages with overlaying data points. An example can be seen below, where a tunnel passes close underneath the house of one of the authors and incorrect proximities are calculated.
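The 100 m cap described above can be expressed compactly with base R's vectorised pmin(); a minimal sketch (the helper name cap_distance is illustrative, not from the project code):

```r
# Cap distances at a maximum value; pmin() works element-wise
cap_distance <- function(d, cap = 100) pmin(as.numeric(d), cap)

cap_distance(c(12.5, 250, 99.9)) # values beyond the cap collapse to 100
```

The repeated ifelse() calls in the chunks below implement the same capping per distance column.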

Code
# Since some of the networks are extremely large data sets, a buffer around all data
# points was intersected with the networks, and only the intersecting network
# segments were used to calculate the distances.
data_AOI <- st_buffer(working_dataset, 50) %>% st_union()

tram_netz_AOI <- st_intersection(tram_netz, data_AOI)
working_dataset$distance_tram <- as.numeric(st_distance(working_dataset, tram_netz_AOI))
working_dataset$distance_tram <- ifelse(working_dataset$distance_tram > 100, 100, working_dataset$distance_tram)

zug_netz_AOI <- st_intersection(zug_netz, data_AOI)
working_dataset$distance_zug <- as.numeric(st_distance(working_dataset, zug_netz_AOI))
working_dataset$distance_zug <- ifelse(working_dataset$distance_zug > 100, 100, working_dataset$distance_zug)

gewaesser_AOI <- st_intersection(gewaesser, data_AOI)
working_dataset$distance_gewaesser <- as.numeric(st_distance(working_dataset, gewaesser_AOI))
working_dataset$distance_gewaesser <- ifelse(working_dataset$distance_gewaesser > 100, 100, working_dataset$distance_gewaesser)

bus_netz_AOI <- st_intersection(bus_netz, data_AOI)
working_dataset$distance_bus <- as.numeric(st_distance(working_dataset, bus_netz_AOI))
working_dataset$distance_bus <- ifelse(working_dataset$distance_bus > 100, 100, working_dataset$distance_bus)

working_dataset$distance_strasse <- as.numeric(st_distance(working_dataset, strassen))
working_dataset$distance_strasse <- ifelse(working_dataset$distance_strasse > 100, 100, working_dataset$distance_strasse)
Code
data_AOI <- st_buffer(posmo_pool, 50) %>% st_union()

tram_netz_AOI <- st_intersection(tram_netz, data_AOI)
posmo_pool$distance_tram <- as.numeric(st_distance(posmo_pool, tram_netz_AOI))
posmo_pool$distance_tram <- ifelse(posmo_pool$distance_tram > 100, 100, posmo_pool$distance_tram)

zug_netz_AOI <- st_intersection(zug_netz, data_AOI)
posmo_pool$distance_zug <- as.numeric(st_distance(posmo_pool, zug_netz_AOI))
posmo_pool$distance_zug <- ifelse(posmo_pool$distance_zug > 100, 100, posmo_pool$distance_zug)

gewaesser_AOI <- st_intersection(gewaesser, data_AOI)
posmo_pool$distance_gewaesser <- as.numeric(st_distance(posmo_pool, gewaesser_AOI))
posmo_pool$distance_gewaesser <- ifelse(posmo_pool$distance_gewaesser > 100, 100, posmo_pool$distance_gewaesser)

bus_netz_AOI <- st_intersection(bus_netz, data_AOI)
posmo_pool$distance_bus <- as.numeric(st_distance(posmo_pool, bus_netz_AOI))
posmo_pool$distance_bus <- ifelse(posmo_pool$distance_bus > 100, 100, posmo_pool$distance_bus)

posmo_pool$distance_strasse <- as.numeric(st_distance(posmo_pool, strassen))
posmo_pool$distance_strasse <- ifelse(posmo_pool$distance_strasse > 100, 100, posmo_pool$distance_strasse)
Code
# Convert infinite sinuosity values to NA, then remove all rows containing NAs

working_dataset$sinuosity[is.infinite(working_dataset$sinuosity)] <- NA
working_dataset <- na.omit(working_dataset)

posmo_pool$sinuosity[is.infinite(posmo_pool$sinuosity)] <- NA
posmo_pool <- na.omit(posmo_pool)

Variable Correlation

The correlation matrix shows the pairwise correlations between variables. The computed velocities and accelerations correlate strongly; the other variables show only little correlation.

Code
# select columns with relevant variable and standardize them
standardized <- working_dataset[, 6:15] %>% 
  st_drop_geometry() %>%
  scale(center = TRUE, scale = TRUE) %>%
  as.data.frame()


corr_matrix <- cor(standardized)
ggcorrplot(corr_matrix)

Code
data_pca <- princomp(standardized) # PCA on the standardized variables, not on the correlation matrix itself
loadings <- data_pca$loadings
scores <- as.data.frame(data_pca$scores)
Code
# Save full dataset as csv
working_dataset <- st_drop_geometry(working_dataset)
posmo_pool <- st_drop_geometry(posmo_pool)

write.csv(working_dataset, file = "data/full_working_dataset.csv", row.names = F)
write.csv(posmo_pool, file = "data/full_posmo_pool_dataset.csv", row.names = F)

3.4 Class Distribution Overview

Code
working_dataset <- read.delim("data/full_working_dataset.csv",sep=",", header = T) 
posmo_pool <- read.delim("data/full_posmo_pool_dataset.csv",sep=",", header = T) 

working_dataset <- rbind(working_dataset, posmo_pool)

working_dataset <- na.omit(working_dataset)
# Show class distribution
ggplot(working_dataset) + 
  geom_bar(aes(x = transport_mode)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),         
        panel.background = element_rect(fill = "transparent", color = NA)) +
  ggtitle("Class Distribution over Unfiltered Data Set") +
  xlab("Transport Mode") + ylab("Count")

The distribution shows that many classes are very poorly represented in the data. Unlabelled data is removed, and the underrepresented transport modes are aggregated into the class “Other”.

Code
# Remove unwanted classes
working_dataset <- working_dataset[working_dataset$transport_mode != "", ]
working_dataset <- working_dataset[working_dataset$transport_mode != "Other1", ]
# working_dataset <- working_dataset[working_dataset$transport_mode != "Funicular", ]
# working_dataset <- working_dataset[working_dataset$transport_mode != "E_Kick_Scooter", ]
# working_dataset <- working_dataset[working_dataset$transport_mode != "Run", ]
# working_dataset <- working_dataset[working_dataset$transport_mode != "Boat", ]
# working_dataset <- working_dataset[working_dataset$transport_mode != "Skateboard", ]

# Move less relevant modes into category "other"
working_dataset$transport_mode[working_dataset$transport_mode == "Funicular"] <- "Other"
working_dataset$transport_mode[working_dataset$transport_mode == "E_Kick_Scooter"] <- "Other"
working_dataset$transport_mode[working_dataset$transport_mode == "Run"] <- "Other"
working_dataset$transport_mode[working_dataset$transport_mode == "Skateboard"] <- "Other"
working_dataset$transport_mode[working_dataset$transport_mode == "Airplane"] <- "Other"
working_dataset$transport_mode[working_dataset$transport_mode == "E_Bike"] <- "Other"
# working_dataset$transport_mode[working_dataset$transport_mode == "Boat"] <- "Other"

# Show class distribution
classes <- ggplot(working_dataset) + 
  geom_bar(aes(x = transport_mode)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),         
        panel.background = element_rect(fill = "transparent", color = NA)) +
  geom_hline(yintercept = 500, colour = "red", linetype = "dashed") +
  ggtitle("Class Distribution over Redistributed Data Set") +
  xlab("Transport Mode") + ylab("Count")

classes

Code
table(working_dataset$transport_mode)

 Bike  Boat   Bus   Car Horse Other Train  Tram  Walk 
 2120   417  4100  7272  3866   468 18577  4732 10283 

The dashed red line lies at a count of 500, representing the desired sample count for the subsequent under-sampling of the data set.

3.5 Sampling Interval

The sampling intervals were found to be highly inconsistent. Many large sampling intervals originate from the tracked person being stationary. Therefore, the sampling interval is limited to 60 seconds. No re-sampling to equalize the sampling interval is undertaken, in order to preserve the GPS position and the calculated parameters for each data point, since with large sampling intervals the calculated movement parameters become inaccurate and unrepresentative of the transport mode. After applying the threshold, the actual sampling intervals of 10 and 15 seconds, respectively, can be seen in the box plot.

Code
boxplot_diff_s <- ggplot(working_dataset,aes(x = transport_mode, y = diff_s)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),         
        panel.background = element_rect(fill = "transparent", color = NA)) +
  ylab("sample interval [s]") + xlab("Transport Mode") +
  ggtitle("Sample Interval per Class")

# Set threshold for parameters
working_dataset <- working_dataset[working_dataset$diff_s < 60,]

boxplot_diff_s_after <- ggplot(working_dataset, aes(x = transport_mode, y = diff_s)) + 
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),         
        panel.background = element_rect(fill = "transparent", color = NA)) +
  ylab("sample interval [s]") + xlab("Transport Mode") +
  ggtitle("Sample Interval per Class \nAfter Threshold")

# Display the plots side by side
grid.arrange(boxplot_diff_s, boxplot_diff_s_after, nrow = 1)

After the initial removal of sampling intervals larger than 60 seconds, we repeat the step for the moving-window sampling intervals.

Code
boxplot_diff_s_mean <- ggplot(working_dataset, aes(x = transport_mode, y = diff_s_mean)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),        
        panel.background = element_rect(fill = "transparent", color = NA)) +
  ylab("sample interval [s]")

# Set threshold for parameters
working_dataset <- working_dataset[working_dataset$diff_s_mean < 60,]

boxplot_diff_s_mean_after <- ggplot(working_dataset, aes(x = transport_mode, y = diff_s_mean)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),         
        panel.background = element_rect(fill = "transparent", color = NA)) +
  ylab("sample interval [s]")

# Display the plots side by side
grid.arrange(boxplot_diff_s_mean, boxplot_diff_s_mean_after, nrow = 1)

3.6 Parameter Thresholds

3.6.1 Velocity

The velocity attribute shows some outliers for the train and walking classes. The threshold for the maximum velocity is set to 55.55 m/s (200 km/h), as no transport mode in our analysis is expected to exceed such a velocity. One exception is airplanes, but with only very few data points there is no benefit in including higher velocities. After setting the threshold, some obvious outliers remain for the walking class. Reasons for such outliers in the calculated velocity could be:

  • Wrong Classification, even though the data is verified.

  • GPS inaccuracies, where the GPS point location is “jumping” creating very inaccurate, zigzagging tracking data.

Code
boxplot_velocity <- ggplot(working_dataset, aes(x = transport_mode, y = velocity)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ylab("velocity [m/s]") + xlab("Transport Mode") +
  ggtitle("Velocity per Class")

# Set threshold for parameters
working_dataset <- working_dataset[working_dataset$velocity < 55.55,]

boxplot_velocity_after <- ggplot(working_dataset, aes(x = transport_mode, y = velocity)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ylab("velocity [m/s]") + xlab("Transport Mode") +
  ggtitle("Velocity per Class After Threshold")

# Display the plots side by side
grid.arrange(boxplot_velocity, boxplot_velocity_after, nrow = 1)

3.6.2 Moving Window Velocity

The moving window velocity shows less extreme outliers. The number of outliers can be reduced further by setting the threshold to 55.55 m/s (200 km/h). After applying the threshold, classes with similar average velocities can be identified. This might already be an indicator of classes which are difficult to distinguish using classification methods.

Code
boxplot_velocity_mean <- ggplot(working_dataset, aes(x = transport_mode, y = velocity_mean)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ylab("moving window velocity[m/s]")

# Set threshold for parameters
working_dataset <- working_dataset[working_dataset$velocity_mean < 55.55,]

boxplot_velocity_mean_after <- ggplot(working_dataset, aes(x = transport_mode, y = velocity_mean)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ylab("moving window velocity [m/s]")

# Display the plots side by side
grid.arrange(boxplot_velocity_mean, boxplot_velocity_mean_after, nrow = 1)

3.6.3 Acceleration

The acceleration threshold is set to \(10 m/s^2\), as this is considered the maximum plausible acceleration for all classes in this analysis. The distribution of the classes is similar to the velocities. In the parameter correlation analysis, a strong correlation between velocity and acceleration was found.

Code
boxplot_acceleration <- ggplot(working_dataset, aes(x = transport_mode, y = acceleration)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),         
        panel.background = element_rect(fill = "transparent", color = NA)) +
  ylab("acceleration [m/s^2]")

# Set threshold for parameters
working_dataset <- working_dataset[working_dataset$acceleration < 10,]

boxplot_acceleration_after <- ggplot(working_dataset, aes(x = transport_mode, y = acceleration)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ylab("acceleration [m/s^2]")

# Display the plots side by side
grid.arrange(boxplot_acceleration, boxplot_acceleration_after, nrow = 1)

3.6.4 Moving Window Acceleration

The acceleration threshold is set to \(10\,m/s^2\), as for the single-point acceleration values.

Code
boxplot_acceleration_mean <- ggplot(working_dataset, aes(x = transport_mode, y = acceleration_mean)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),         
        panel.background = element_rect(fill = "transparent", color = NA)) +
  ylab("moving window acceleration [m/s^2]")

# Set threshold for parameters
working_dataset <- working_dataset[working_dataset$acceleration_mean < 10,]

boxplot_acceleration_mean_after <- ggplot(working_dataset, aes(x = transport_mode, y = acceleration_mean)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ylab("moving window acceleration [m/s^2]")

# Display the plots side by side
grid.arrange(boxplot_acceleration_mean, boxplot_acceleration_mean_after, nrow = 1)

3.7 Under Sampling

The data set is strongly imbalanced. To improve model accuracy we use under sampling to balance the classes, aiming for 500 samples per class. The classes “boat” and “other” do not have sufficient points. The sample size is not decreased further, so that enough data remains to train and test the computed models.

Code
# Create copy for later use
working_dataset_full <- working_dataset

# Set the maximum number of entries per class
max_entries <- 500

# Perform under sampling
working_dataset <- working_dataset |>
  group_by(transport_mode) |>
  sample_n(min(n(), max_entries)) |>
  ungroup()

# Check the resulting undersampled DataFrame
table(working_dataset$transport_mode)

 Bike  Boat   Bus   Car Horse Other Train  Tram  Walk 
  500   352   500   500   500   395   500   500   500 
Code
# Drop unwanted and geometry columns
working_dataset <- working_dataset[,-c(1,3:5)]
working_dataset <- st_drop_geometry(working_dataset)

3.8 Classification

To classify the data a Support Vector Machine (SVM) is applied. A linear, a radial, and a polynomial SVM are tested. We apply a single train-test split as well as 10-fold cross validation with 3 repeats. Cross validation improves model robustness compared to the single train-test split and reduces bias, resulting in a more representative evaluation of model performance. The tuning sequences are replaced by the best hyperparameters found for each model, to save computation time.

The models are evaluated with the confusion matrix, overall accuracy, recall, precision, and F1-score. A confusion matrix is a table that summarizes the performance of a classification model by showing the counts of true positive, true negative, false positive, and false negative predictions. Precision measures the proportion of correctly predicted positive instances out of all instances predicted as positive. Recall measures the proportion of correctly predicted positive instances out of all actual positive instances. The F1-score, the harmonic mean of precision and recall, combines both into a single metric and is useful when both false positives and false negatives matter.
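As an illustration of these definitions (a minimal base-R sketch with a hypothetical two-class confusion matrix, independent of the models below), the per-class metrics can be computed directly from the matrix counts:

```r
# Toy confusion matrix: rows = predicted class, columns = reference class
cm <- matrix(c(40, 10,   # predicted "walk": 40 truly walk, 10 actually bike
                5, 45),  # predicted "bike":  5 actually walk, 45 truly bike
             nrow = 2, byrow = TRUE,
             dimnames = list(predicted = c("walk", "bike"),
                             reference = c("walk", "bike")))

precision <- diag(cm) / rowSums(cm)  # correct / all predicted per class
recall    <- diag(cm) / colSums(cm)  # correct / all actual per class
f1        <- 2 * precision * recall / (precision + recall)  # harmonic mean

round(rbind(precision, recall, f1), 3)
```

caret's `confusionMatrix()` reports the same per-class quantities via its `$byClass` element, as used in the evaluations below.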

Code
# Define Control for 10-fold CV
fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           repeats = 3)

We create a training and a test data set. The training data set contains 80% of the data points and the test set contains 20% of the data points.

Code
# Convert to Factor
working_dataset$transport_mode <- as.factor(working_dataset$transport_mode)

# Create Training and Test Data Set
TrainingIndex <- createDataPartition(working_dataset$transport_mode, p = 0.8, list = F)
TrainingSet <- working_dataset[TrainingIndex,]
TestingSet <- working_dataset[-TrainingIndex,]

3.8.1 Linear SVM

A linear support vector machine is tested and its performance evaluated. Different hyperparameter settings were tested to find the best model. For the linear SVM the best fit found is C = 3, achieving an overall accuracy of 77.9%. Precision, recall and F1-score vary between the classes but average around 78-79%.

Code
# Set seed for reproducibility
set.seed(100)

# Perform Linear SVM
model.svmL <- train(transport_mode ~ ., 
               data = TrainingSet,
               method = "svmLinear",
               na.action = na.omit,
               preprocess = c("scale", "center"),
               trControl = trainControl(method = "none"),
               tuneGrid = data.frame(C = 3),
               )

# Perform Linear SVM with 10-fold Cross Validation (Reduce Length for shorter computation time)
model.svmL.cv <- train(transport_mode ~ ., 
               data = TrainingSet,
               method = "svmLinear",
               na.action = na.omit,
               preprocess = c("sclae","center"),
               trControl = fitControl,
               tuneGrid = expand.grid(C = seq(3, 6, length = 4)) # Search grid to find the best-fitting model
               )

# Show Best Tune
#print(model.svmL.cv$bestTune)

# Make Predictions
model.svmL.training <- predict(model.svmL, TrainingSet)
model.svmL.testing <- predict(model.svmL, TestingSet)
model.svmL.cv.training <- predict(model.svmL.cv, TrainingSet)
model.svmL.cv.testing <- predict(model.svmL.cv, TestingSet)

# Model Performance
model.svmL.training.confusion <- confusionMatrix(model.svmL.training, as.factor(TrainingSet$transport_mode))
model.svmL.testing.confusion <- confusionMatrix(model.svmL.testing, as.factor(TestingSet$transport_mode))
model.svmL.cv.training.confusion <- confusionMatrix(model.svmL.cv.training, as.factor(TrainingSet$transport_mode))
(model.svmL.cv.testing.confusion <- confusionMatrix(model.svmL.cv.testing, as.factor(TestingSet$transport_mode))) # Print test run with CV
Confusion Matrix and Statistics

          Reference
Prediction Bike Boat Bus Car Horse Other Train Tram Walk
     Bike   250    0  64  26     0     6     3   11    1
     Boat     0  282   0   0     0     0     0    0    0
     Bus     35    0 167  11     0     9     2    3   10
     Car      7    0  27 345     1    15     0    0    1
     Horse    7    0   5   8   396    10     0    0  103
     Other   15    0   7   5     2   200     0    1   15
     Train    0    0   3   2     0     0   392    1    0
     Tram    78    0  95   1     0    36     2  361   17
     Walk     8    0  32   2     1    40     1   23  253

Overall Statistics
                                          
               Accuracy : 0.7787          
                 95% CI : (0.7644, 0.7926)
    No Information Rate : 0.1177          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.7505          
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: Bike Class: Boat Class: Bus Class: Car Class: Horse
Sensitivity              0.62500     1.00000    0.41750     0.8625       0.9900
Specificity              0.96298     1.00000    0.97665     0.9830       0.9556
Pos Pred Value           0.69252     1.00000    0.70464     0.8712       0.7486
Neg Pred Value           0.95061     1.00000    0.92629     0.9817       0.9986
Prevalence               0.11772     0.08299    0.11772     0.1177       0.1177
Detection Rate           0.07357     0.08299    0.04915     0.1015       0.1165
Detection Prevalence     0.10624     0.08299    0.06975     0.1165       0.1557
Balanced Accuracy        0.79399     1.00000    0.69708     0.9227       0.9728
                     Class: Other Class: Train Class: Tram Class: Walk
Sensitivity               0.63291       0.9800      0.9025     0.63250
Specificity               0.98540       0.9980      0.9236     0.96431
Pos Pred Value            0.81633       0.9849      0.6119     0.70278
Neg Pred Value            0.96321       0.9973      0.9861     0.95161
Prevalence                0.09300       0.1177      0.1177     0.11772
Detection Rate            0.05886       0.1154      0.1062     0.07446
Detection Prevalence      0.07210       0.1171      0.1736     0.10594
Balanced Accuracy         0.80916       0.9890      0.9131     0.79840
Code
# Precision for each class
cat("\nPrecision for each class:\n")

Precision for each class:
Code
(precision_svmL <- model.svmL.cv.testing.confusion$byClass[, "Precision"])
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.6925208    1.0000000    0.7046414    0.8712121    0.7485822    0.8163265 
Class: Train  Class: Tram  Class: Walk 
   0.9849246    0.6118644    0.7027778 
Code
cat("\nAverage Precision:\n")

Average Precision:
Code
(precision_svmL <- mean(model.svmL.cv.testing.confusion$byClass[, "Precision"]))
[1] 0.7925389
Code
# Recall for each class
cat("\nRecall for each class:\n")

Recall for each class:
Code
(recall_svmL<- model.svmL.cv.testing.confusion$byClass[, "Recall"])
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.6250000    1.0000000    0.4175000    0.8625000    0.9900000    0.6329114 
Class: Train  Class: Tram  Class: Walk 
   0.9800000    0.9025000    0.6325000 
Code
cat("\nAverage Recall:\n")

Average Recall:
Code
(recall_svmL <- mean(model.svmL.cv.testing.confusion$byClass[, "Recall"]))
[1] 0.7825457
Code
# F1-Score for each class
cat("\nF1-Score for each class:\n")

F1-Score for each class:
Code
(f1_score_svmL <- model.svmL.cv.testing.confusion$byClass[, "F1"])
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.6570302    1.0000000    0.5243328    0.8668342    0.8525296    0.7130125 
Class: Train  Class: Tram  Class: Walk 
   0.9824561    0.7292929    0.6657895 
Code
cat("\nAverage F1-Score:\n")

Average F1-Score:
Code
(f1_score_svmL <- mean(model.svmL.cv.testing.confusion$byClass[, "F1"]))
[1] 0.7768086
Code
# Save the models
saveRDS(model.svmL, "models/model_svmL.rds")
saveRDS(model.svmL.cv, "models/model_svmL_cv.rds")

3.8.2 Radial Support Vector Machine

The radial SVM reaches an overall test accuracy of 79.3%, with average precision, recall and F1-score around 79-80%, comparable to the linear SVM. It nevertheless performs better in practice, since its metrics vary less between classes.

Code
# Set seed for reproducibility
set.seed(108)

# Build Training Model
model.svmRadial <- train(transport_mode ~ .,
                         data = TrainingSet,
                         method = "svmRadial",
                         na.action = na.omit,
                         preprocess = c("scale", "center"),
                         trControl = trainControl(method = "none"),
                         tuneGrid = expand.grid(sigma = 0.8683492, C = 5)
)             

# Build CV Model (long processing!!!)
TrainingSet$transport_mode <- as.character(TrainingSet$transport_mode)
model.svmRadial.cv <- train(transport_mode ~ .,
                            data = TrainingSet,
                            method = "svmRadial",
                            na.action = na.omit,
                            preprocess = c("scale", "center"),
                            trControl = fitControl,
                            tuneGrid = expand.grid(sigma = 0.8683492, C = 5)
)
               
(model.svmRadial.cv$bestTune)
      sigma C
1 0.8683492 5
Code
# Make Predictions
model.svmRadial.training <- predict(model.svmRadial, TrainingSet)
model.svmRadial.testing <- predict(model.svmRadial, TestingSet)

# Make Predictions from Cross Validation model
model.svmRadial.cv.training <- predict(model.svmRadial.cv, TrainingSet)
model.svmRadial.cv.testing <- predict(model.svmRadial.cv, TestingSet)

# Model Performance
model.svmRadial.training.confusion <- confusionMatrix(model.svmRadial.training, as.factor(TrainingSet$transport_mode))
model.svmRadial.testing.confusion <- confusionMatrix(model.svmRadial.testing, as.factor(TestingSet$transport_mode))
model.svmRadial.cv.confusion <- confusionMatrix(model.svmRadial.cv.training, as.factor(TrainingSet$transport_mode))
(model.svmRadial.cv.testing.confusion <- confusionMatrix(model.svmRadial.cv.testing, as.factor(TestingSet$transport_mode))) # Print test run with CV
Confusion Matrix and Statistics

          Reference
Prediction Bike Boat Bus Car Horse Other Train Tram Walk
     Bike    74    0  11  10     1     2     0    5    0
     Boat     0   65   0   0     0     0     0    0    0
     Bus     11    0  65   6     1     1     1    7    2
     Car      3    0   5  77     0     1     0    0    2
     Horse    1    0   0   1    93     2     0    0   24
     Other    5    0   2   1     1    60     0    1    4
     Train    3    5   2   4     0     5    99    4    5
     Tram     3    0  10   0     0     3     0   83    6
     Walk     0    0   5   1     4     5     0    0   57

Overall Statistics
                                          
               Accuracy : 0.7927          
                 95% CI : (0.7638, 0.8195)
    No Information Rate : 0.1178          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.7663          
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: Bike Class: Boat Class: Bus Class: Car Class: Horse
Sensitivity              0.74000     0.92857    0.65000    0.77000       0.9300
Specificity              0.96128     1.00000    0.96128    0.98531       0.9626
Pos Pred Value           0.71845     1.00000    0.69149    0.87500       0.7686
Neg Pred Value           0.96515     0.99362    0.95364    0.96978       0.9904
Prevalence               0.11779     0.08245    0.11779    0.11779       0.1178
Detection Rate           0.08716     0.07656    0.07656    0.09069       0.1095
Detection Prevalence     0.12132     0.07656    0.11072    0.10365       0.1425
Balanced Accuracy        0.85064     0.96429    0.80564    0.87766       0.9463
                     Class: Other Class: Train Class: Tram Class: Walk
Sensitivity               0.75949       0.9900     0.83000     0.57000
Specificity               0.98182       0.9626     0.97063     0.97997
Pos Pred Value            0.81081       0.7795     0.79048     0.79167
Neg Pred Value            0.97548       0.9986     0.97715     0.94466
Prevalence                0.09305       0.1178     0.11779     0.11779
Detection Rate            0.07067       0.1166     0.09776     0.06714
Detection Prevalence      0.08716       0.1496     0.12367     0.08481
Balanced Accuracy         0.87066       0.9763     0.90031     0.77499
Code
# Precision for each class
cat("\nPrecision for each class:\n")

Precision for each class:
Code
(precision_svmRadial <- model.svmRadial.cv.testing.confusion$byClass[, "Precision"])
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.7184466    1.0000000    0.6914894    0.8750000    0.7685950    0.8108108 
Class: Train  Class: Tram  Class: Walk 
   0.7795276    0.7904762    0.7916667 
Code
cat("\nAverage Precision:\n")

Average Precision:
Code
(precision_svmRadial_avg <- mean(model.svmRadial.cv.testing.confusion$byClass[, "Precision"]))
[1] 0.8028902
Code
# Recall for each class
cat("\nRecall for each class:\n")

Recall for each class:
Code
(recall_svmRadial<- model.svmRadial.cv.testing.confusion$byClass[, "Recall"])
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.7400000    0.9285714    0.6500000    0.7700000    0.9300000    0.7594937 
Class: Train  Class: Tram  Class: Walk 
   0.9900000    0.8300000    0.5700000 
Code
cat("\nAverage Recall:\n")

Average Recall:
Code
(recall_svmRadial_avg <- mean(model.svmRadial.cv.testing.confusion$byClass[, "Recall"]))
[1] 0.7964517
Code
# F1-Score for each class
cat("\nF1-Score for each class:\n")

F1-Score for each class:
Code
(f1_score_svmRadial <- model.svmRadial.cv.testing.confusion$byClass[, "F1"])
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.7290640    0.9629630    0.6701031    0.8191489    0.8416290    0.7843137 
Class: Train  Class: Tram  Class: Walk 
   0.8722467    0.8097561    0.6627907 
Code
cat("\nAverage F1-Score:\n")

Average F1-Score:
Code
(f1_score_svmRadial_avg <- mean(model.svmRadial.cv.testing.confusion$byClass[, "F1"]))
[1] 0.7946684
Code
# Save the models
saveRDS(model.svmRadial, "models/model_svmRadial.rds")
saveRDS(model.svmRadial.cv, "models/model_svmRadial_cv.rds")

3.8.3 Polynomial SVM

Out of all tested models the polynomial SVM achieves the highest overall test accuracy, 81.9%, and the best recall, precision and F1-scores. Its per-class performance is noticeably better than that of the other models. Cohen's Kappa lies at 0.80, indicating high agreement between the predictions and the ground truth labels. The p-value indicates that the accuracy of the polynomial SVM is significantly better than the no-information rate.

Code
set.seed(100)

# Build Training Model
model.svmPoly <- train(transport_mode ~ ., 
               data = TrainingSet,
               method = "svmPoly",
               na.action = na.omit,
               preprocess = c("sclae","center"),
               trControl = trainControl(method = "none"),
               tuneGrid = data.frame(degree = 3, scale = 0.1, C = 4)
               )
               

# Build CV Model (long processing)
TrainingSet$transport_mode <- as.character(TrainingSet$transport_mode)
model.svmPoly.cv <- train(transport_mode ~ ., 
               data = TrainingSet,
               method = "svmPoly",
               na.action = na.omit,
               preprocess = c("sclae","center"),
               trControl = fitControl,
               tuneGrid = data.frame(degree = 3, scale = 0.1, C = 4) # Best-fit hyperparameters
               )
                
               
(model.svmPoly.cv$bestTune)
  degree scale C
1      3   0.1 4
Code
# Make Predictions
model.svmPoly.training <- predict(model.svmPoly, TrainingSet)
model.svmPoly.testing <- predict(model.svmPoly, TestingSet)

# Make Predictions from Cross Validation model
model.svmPoly.cv.training <- predict(model.svmPoly.cv, TrainingSet)
model.svmPoly.cv.testing <- predict(model.svmPoly.cv, TestingSet)

# Model Performance
model.svmPoly.training.confusion <- confusionMatrix(model.svmPoly.training, as.factor(TrainingSet$transport_mode))
model.svmPoly.testing.confusion <- confusionMatrix(model.svmPoly.testing, as.factor(TestingSet$transport_mode))
model.svmPoly.cv.confusion <- confusionMatrix(model.svmPoly.cv.training, as.factor(TrainingSet$transport_mode))
(model.svmPoly.cv.testing.confusion <- confusionMatrix(model.svmPoly.cv.testing, as.factor(TestingSet$transport_mode))) # Print test run with CV
Confusion Matrix and Statistics

          Reference
Prediction Bike Boat Bus Car Horse Other Train Tram Walk
     Bike    79    0  19  13     0     3     0    6    1
     Boat     0   70   0   0     0     0     0    0    0
     Bus      8    0  59   5     0     2     2    1    2
     Car      1    0   6  80     0     1     0    0    2
     Horse    1    0   0   1    99     3     0    0   25
     Other    5    0   2   0     1    60     0    0    5
     Train    0    0   0   0     0     0    98    1    0
     Tram     6    0  10   0     0     3     0   90    5
     Walk     0    0   4   1     0     7     0    2   60

Overall Statistics
                                        
               Accuracy : 0.8186        
                 95% CI : (0.791, 0.844)
    No Information Rate : 0.1178        
    P-Value [Acc > NIR] : < 2.2e-16     
                                        
                  Kappa : 0.7956        
                                        
 Mcnemar's Test P-Value : NA            

Statistics by Class:

                     Class: Bike Class: Boat Class: Bus Class: Car Class: Horse
Sensitivity              0.79000     1.00000    0.59000    0.80000       0.9900
Specificity              0.94393     1.00000    0.97330    0.98665       0.9599
Pos Pred Value           0.65289     1.00000    0.74684    0.88889       0.7674
Neg Pred Value           0.97115     1.00000    0.94675    0.97365       0.9986
Prevalence               0.11779     0.08245    0.11779    0.11779       0.1178
Detection Rate           0.09305     0.08245    0.06949    0.09423       0.1166
Detection Prevalence     0.14252     0.08245    0.09305    0.10601       0.1519
Balanced Accuracy        0.86696     1.00000    0.78165    0.89332       0.9750
                     Class: Other Class: Train Class: Tram Class: Walk
Sensitivity               0.75949       0.9800      0.9000     0.60000
Specificity               0.98312       0.9987      0.9680     0.98131
Pos Pred Value            0.82192       0.9899      0.7895     0.81081
Neg Pred Value            0.97552       0.9973      0.9864     0.94839
Prevalence                0.09305       0.1178      0.1178     0.11779
Detection Rate            0.07067       0.1154      0.1060     0.07067
Detection Prevalence      0.08598       0.1166      0.1343     0.08716
Balanced Accuracy         0.87131       0.9893      0.9340     0.79065
Code
# Precision for each class
cat("\nPrecision for each class:\n")

Precision for each class:
Code
(precision_svmPoly <- model.svmPoly.cv.testing.confusion$byClass[, "Precision"])
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.6528926    1.0000000    0.7468354    0.8888889    0.7674419    0.8219178 
Class: Train  Class: Tram  Class: Walk 
   0.9898990    0.7894737    0.8108108 
Code
cat("\nAverage Precision:\n")

Average Precision:
Code
(precision_svmPoly_avg <- mean(model.svmPoly.cv.testing.confusion$byClass[, "Precision"]))
[1] 0.8297956
Code
# Recall for each class
cat("\nRecall for each class:\n")

Recall for each class:
Code
(recall_svmPoly <- model.svmPoly.cv.testing.confusion$byClass[, "Recall"])
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.7900000    1.0000000    0.5900000    0.8000000    0.9900000    0.7594937 
Class: Train  Class: Tram  Class: Walk 
   0.9800000    0.9000000    0.6000000 
Code
cat("\nAverage Recall:\n")

Average Recall:
Code
(recall_svmPoly_avg <- mean(model.svmPoly.cv.testing.confusion$byClass[, "Recall"]))
[1] 0.8232771
Code
# F1-Score for each class
cat("\nF1-Score for each class:\n")

F1-Score for each class:
Code
(f1_score_svmPoly <- model.svmPoly.cv.testing.confusion$byClass[, "F1"])
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.7149321    1.0000000    0.6592179    0.8421053    0.8646288    0.7894737 
Class: Train  Class: Tram  Class: Walk 
   0.9849246    0.8411215    0.6896552 
Code
cat("\nAverage F1-Score:\n")

Average F1-Score:
Code
(f1_score_svmPoly_avg <- mean(model.svmPoly.cv.testing.confusion$byClass[, "F1"]))
[1] 0.8206732
Code
# Save the models
saveRDS(model.svmPoly, "models/model_svmPoly.rds")
saveRDS(model.svmPoly.cv, "models/model_svmPoly_cv.rds")

Since the polynomial SVM showed the best performance, this model is used to predict the transport mode on the full data set, which contains 40,529 data points after preprocessing and threshold filtering. The achieved overall accuracy is 81.8%, with a 95% confidence interval of [81.44%, 82.19%]. The full data set is highly imbalanced; nevertheless, the unweighted (macro) average F1-score lies at 75.7%.

Code
# Set seed for reproducibility
set.seed(100)

# Run Model on full data set
model.final <- predict(model.svmPoly.cv, working_dataset_full)

# Create final data frame
working_dataset_result <- data.frame(working_dataset_full, model.final) 

# Confusion Matrix for new results
conf_matrix <- confusionMatrix(as.factor(working_dataset_result$transport_mode), as.factor(working_dataset_result$model.final))
cat("Confusion Matrix:\n")
Confusion Matrix:
Code
conf_matrix
Confusion Matrix and Statistics

          Reference
Prediction  Bike  Boat   Bus   Car Horse Other Train  Tram  Walk
     Bike   1514     3    95    20    33    72     0    87    29
     Boat      0   352     0     0     0     0     0     0     0
     Bus     481     0  2145   212    28   119    17   474   260
     Car     383    10   248  4382   137   115    24    22    30
     Horse     6     0     1    18  3657    22     0     0    13
     Other     5     0     4     5    13   323     0    21    24
     Train   107     0    78    92     0     2 12394    96    12
     Tram    168     0   166    13     0    84    46  3790   231
     Walk    109     3   234    61  2056   409    22   349  4599

Overall Statistics
                                          
               Accuracy : 0.8182          
                 95% CI : (0.8144, 0.8219)
    No Information Rate : 0.3085          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.7797          
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: Bike Class: Boat Class: Bus Class: Car Class: Horse
Sensitivity              0.54598    0.956522    0.72198     0.9123      0.61732
Specificity              0.99102    1.000000    0.95763     0.9729      0.99827
Pos Pred Value           0.81705    1.000000    0.57414     0.8189      0.98386
Neg Pred Value           0.96744    0.999602    0.97755     0.9880      0.93841
Prevalence               0.06843    0.009081    0.07331     0.1185      0.14618
Detection Rate           0.03736    0.008686    0.05293     0.1081      0.09024
Detection Prevalence     0.04572    0.008686    0.09219     0.1320      0.09172
Balanced Accuracy        0.76850    0.978261    0.83981     0.9426      0.80779
                     Class: Other Class: Train Class: Tram Class: Walk
Sensitivity              0.281850       0.9913     0.78322      0.8848
Specificity              0.998172       0.9862     0.98016      0.9082
Pos Pred Value           0.817722       0.9697     0.84260      0.5865
Neg Pred Value           0.979492       0.9961     0.97088      0.9817
Prevalence               0.028279       0.3085     0.11941      0.1283
Detection Rate           0.007970       0.3058     0.09352      0.1135
Detection Prevalence     0.009747       0.3154     0.11099      0.1935
Balanced Accuracy        0.640011       0.9887     0.88169      0.8965
Code
# Precision for each class
precision <- conf_matrix$byClass[, "Precision"]
cat("\nPrecision for each class:\n")

Precision for each class:
Code
precision
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.8170534    1.0000000    0.5741435    0.8189124    0.9838579    0.8177215 
Class: Train  Class: Tram  Class: Walk 
   0.9697207    0.8425967    0.5864575 
Code
# Average Precision
avg_precision <- mean(conf_matrix$byClass[, "Precision"])
cat("\nAverage Precision:\n")

Average Precision:
Code
avg_precision
[1] 0.8233848
Code
# Recall for each class
recall <- conf_matrix$byClass[, "Recall"]
cat("\nRecall for each class:\n")

Recall for each class:
Code
recall
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.5459791    0.9565217    0.7219791    0.9123465    0.6173194    0.2818499 
Class: Train  Class: Tram  Class: Walk 
   0.9912821    0.7832197    0.8847634 
Code
# Average Recall
avg_recall <- mean(conf_matrix$byClass[, "Recall"])
cat("\nAverage Recall:\n")

Average Recall:
Code
avg_recall
[1] 0.7439179
Code
# F1-Score for each class
f1_score <- conf_matrix$byClass[, "F1"]
cat("\nF1-Score for each class:\n")

F1-Score for each class:
Code
f1_score
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.6545612    0.9777778    0.6396302    0.8631081    0.7586350    0.4192083 
Class: Train  Class: Tram  Class: Walk 
   0.9803829    0.8118239    0.7053681 
Code
# Average F1-Score
avg_f1_score <- mean(conf_matrix$byClass[, "F1"])
cat("\nAverage F1-Score:\n")

Average F1-Score:
Code
avg_f1_score
[1] 0.7567217
Code
# Save working_dataset_result as a CSV file
write.csv(working_dataset_result, "data/working_dataset_result.csv", row.names = FALSE)
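The average F1-score reported above is the unweighted (macro) mean, which treats all classes equally. On an imbalanced data set a prevalence-weighted mean can differ noticeably; the sketch below uses hypothetical per-class scores and counts (in the analysis above the per-class scores come from `conf_matrix$byClass[, "F1"]`) to illustrate the difference:

```r
# Macro vs prevalence-weighted averaging of per-class F1-scores
# (hypothetical values for illustration)
f1_per_class <- c(Train = 0.98, Other = 0.42, Walk = 0.71)
n_per_class  <- c(Train = 12781, Other = 395, Walk = 7842)

macro_f1    <- mean(f1_per_class)                              # every class counts equally
weighted_f1 <- sum(f1_per_class * n_per_class) / sum(n_per_class)  # large classes dominate

c(macro = macro_f1, weighted = weighted_f1)
```

Because the dominant class (here Train) scores well, the weighted mean is higher than the macro mean; the macro mean is the stricter summary for imbalanced data.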

The resulting class distribution shows that the model predicts too many points as Train. This boosts the model's performance, since the Train class is strongly over-represented in this data set. Between the transport modes Car, Bus, Bike and Tram we expected many misclassifications, since key parameters such as velocity and acceleration lie in similar ranges and are difficult for the model to distinguish.

Code
# Show class distribution
final_classes <- ggplot(working_dataset_full) + 
  geom_bar(aes(x = model.final)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),         
        panel.background = element_rect(fill = "transparent", color = NA)) +
  geom_hline(yintercept = 500, colour = "red", linetype = "dashed") +
  ylim(c(0,14000)) + xlab("Transport Mode")

classes <- ggplot(working_dataset_result) + 
  geom_bar(aes(x = transport_mode)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),         
        panel.background = element_rect(fill = "transparent", color = NA)) +
  geom_hline(yintercept = 500, colour = "red", linetype = "dashed") +
  ylim(c(0,14000)) + xlab("Transport Mode")

grid.arrange(classes, final_classes, nrow = 1)

Code
table(working_dataset_result$transport_mode)

 Bike  Boat   Bus   Car Horse Other Train  Tram  Walk 
 1853   352  3736  5351  3717   395 12781  4498  7842 
Code
table(working_dataset_result$model.final)

 Bike  Boat   Bus   Car Horse Other Train  Tram  Walk 
 2773   368  2971  4803  5924  1146 12503  4839  5198 

3.9 Post Processing

To boost the model performance some simple post processing is applied. A moving window function is used to find misclassified points within segments. For each point, the function looks at its x neighbours on either side; if a given percentage of these neighbours belong to one class, the point is reclassified to the majority class of its neighbours. This process can be applied iteratively.

For this data set a window size of x = 10 (i.e. 10 neighbours in each direction), a threshold percentage of 75% and 3 iterations result in a smoothing of the results, but not necessarily a gain in model accuracy.

Code
# Run a loop to identify outlier points in the classification. If the previous and
# following x points are identical but the middle one differs, it is changed.


# Define the number of previous and following points to consider
# x: Number of points to be looked at surrounding current value in each direction (x*2 neighbours considered)
# threshold_percentage: number of points which have to be equal so the current value gets changed
# i: number of iterations

single_point_correction <- function(df, x, threshold_percentage, iterations) {
  
  # Track the number of points changed
  changed_count <- 0  
  
  for (iter in 1:iterations) {
    for (i in (x + 1):(nrow(df) - x)) {
      current_value <- df$model.final[i]
      
      # Find x-Previous & x-Following Values around point i
      previous_values <- df$model.final[(i - x):(i - 1)]
      following_values <- df$model.final[(i + 1):(i + x)]
      
      # Calculate the number of occurrences for each class in the surrounding points
      class_counts <- table(c(previous_values, following_values))
      
      # Find the class that occurs most frequently
      most_frequent_class <- names(class_counts)[which.max(class_counts)]
      
      # Check if the most frequent class exceeds the threshold count
      if (class_counts[most_frequent_class] > threshold_percentage * length(c(previous_values, following_values))) {
        df$model.final[i] <- most_frequent_class
        changed_count <- changed_count + 1
      }
    }
  message("Metrics after each iteration:")
  conf_matrix_func <- confusionMatrix(as.factor(df$transport_mode), as.factor(df$model.final))
  # Precision for each class
  cat("\n Mean Precision\n")
  print(precision_func <- mean(conf_matrix_func$byClass[, "Precision"]))
  # Recall for each class
  cat("\n Mean Recall\n")
  print(recall_func <- mean(conf_matrix_func$byClass[, "Recall"]))
  # F1-Score for each class
  cat("\n Mean F1-Score\n")
  print(f1_score_func <- mean(conf_matrix_func$byClass[, "F1"]))
  
  }
  

  message("Number of times the condition is true and values are updated:", changed_count)
  

  return(df)
}

working_dataset_result_copy <- working_dataset_result
working_dataset_result <- single_point_correction(working_dataset_result, 10, 0.75, 3)

 Mean Precision
[1] 0.8541377

 Mean Recall
[1] 0.7786285

 Mean F1-Score
[1] 0.791695

 Mean Precision
[1] 0.8548945

 Mean Recall
[1] 0.7794413

 Mean F1-Score
[1] 0.792599

 Mean Precision
[1] 0.8552145

 Mean Recall
[1] 0.7796972

 Mean F1-Score
[1] 0.7928828
Code
# Confusion Matrix for new results
conf_matrix_2 <- confusionMatrix(as.factor(working_dataset_result$transport_mode), as.factor(working_dataset_result$model.final))
cat("Confusion Matrix:\n")
Confusion Matrix:
Code
conf_matrix_2
Confusion Matrix and Statistics

          Reference
Prediction  Bike  Boat   Bus   Car Horse Other Train  Tram  Walk
     Bike   1663     0    32    10    34    65     0    36    13
     Boat      0   352     0     0     0     0     0     0     0
     Bus     460     0  2276   212    28   102    15   392   251
     Car     357     2   191  4528   128    83    14    18    30
     Horse     2     0     0     3  3704     6     0     0     2
     Other     4     0     2     5    13   343     0    14    14
     Train    85     0    38    59     0     2 12563    24    10
     Tram     75     0   122     6     0    47    12  4012   224
     Walk     92     2   214    49  2081   306    20   339  4739

Overall Statistics
                                         
               Accuracy : 0.8434         
                 95% CI : (0.8399, 0.847)
    No Information Rate : 0.3115         
    P-Value [Acc > NIR] : < 2.2e-16      
                                         
                  Kappa : 0.81           
                                         
 Mcnemar's Test P-Value : NA             

Statistics by Class:

                     Class: Bike Class: Boat Class: Bus Class: Car Class: Horse
Sensitivity              0.60738    0.988764    0.79165     0.9294      0.61857
Specificity              0.99497    1.000000    0.96122     0.9769      0.99962
Pos Pred Value           0.89746    1.000000    0.60921     0.8462      0.99650
Neg Pred Value           0.97220    0.999900    0.98372     0.9902      0.93795
Prevalence               0.06756    0.008785    0.07094     0.1202      0.14776
Detection Rate           0.04104    0.008686    0.05616     0.1117      0.09140
Detection Prevalence     0.04572    0.008686    0.09219     0.1320      0.09172
Balanced Accuracy        0.80117    0.994382    0.87644     0.9532      0.80910
                     Class: Other Class: Train Class: Tram Class: Walk
Sensitivity              0.359539       0.9952      0.8298      0.8970
Specificity              0.998686       0.9922      0.9864      0.9120
Pos Pred Value           0.868354       0.9829      0.8920      0.6043
Neg Pred Value           0.984774       0.9978      0.9772      0.9834
Prevalence               0.023541       0.3115      0.1193      0.1304
Detection Rate           0.008464       0.3100      0.0990      0.1169
Detection Prevalence     0.009747       0.3154      0.1110      0.1935
Balanced Accuracy        0.679112       0.9937      0.9081      0.9045
Code
# Precision for each class
precision_2 <- conf_matrix_2$byClass[, "Precision"]
cat("\nPrecision for each class:\n")

Precision for each class:
Code
precision_2
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.8974636    1.0000000    0.6092077    0.8461970    0.9965026    0.8683544 
Class: Train  Class: Tram  Class: Walk 
   0.9829434    0.8919520    0.6043101 
Code
# Average Precision
avg_precision_2 <- mean(conf_matrix_2$byClass[, "Precision"])
cat("\nAverage Precision:\n")

Average Precision:
Code
avg_precision_2
[1] 0.8552145
Code
# Recall for each class
recall_2 <- conf_matrix_2$byClass[, "Recall"]
cat("\nRecall for each class:\n")

Recall for each class:
Code
recall_2
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.6073776    0.9887640    0.7916522    0.9293924    0.6185705    0.3595388 
Class: Train  Class: Tram  Class: Walk 
   0.9951679    0.8297828    0.8970282 
Code
# Average Recall
avg_recall_2 <- mean(conf_matrix_2$byClass[, "Recall"])
cat("\nAverage Recall:\n")

Average Recall:
Code
avg_recall_2
[1] 0.7796972
Code
# F1-Score for each class
f1_score_2 <- conf_matrix_2$byClass[, "F1"]
cat("\nF1-Score for each class:\n")

F1-Score for each class:
Code
f1_score_2
 Class: Bike  Class: Boat   Class: Bus   Class: Car Class: Horse Class: Other 
   0.7244609    0.9943503    0.6885494    0.8858456    0.7633179    0.5085248 
Class: Train  Class: Tram  Class: Walk 
   0.9890179    0.8597450    0.7221333 
Code
# Average F1-Score
avg_f1_score <- mean(conf_matrix_2$byClass[, "F1"])
cat("\nAverage F1-Score:\n")

Average F1-Score:
Code
avg_f1_score
[1] 0.7928828

After the post-processing there was no significant difference in the class distribution.

Code
# Show class distribution
final_classes <- ggplot(working_dataset_result) + 
  geom_bar(aes(x = model.final)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),         
        panel.background = element_rect(fill = "transparent", color = NA)) +
  ylim(c(0,14000)) +
  ggtitle("Ground Truth Class Distribution") +
  xlab("Transport Mode")

classes <- ggplot(working_dataset_result) + 
  geom_bar(aes(x = transport_mode)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),         
        panel.background = element_rect(fill = "transparent", color = NA)) +
  ylim(c(0,14000)) +
  ggtitle("Class Distribution After \nPost Processing") +
  xlab("Transport Mode")


grid.arrange(final_classes,classes, nrow = 1)

Code
cat("Ground Truth\n")
Ground Truth
Code
table(working_dataset_result$transport_mode)

 Bike  Boat   Bus   Car Horse Other Train  Tram  Walk 
 1853   352  3736  5351  3717   395 12781  4498  7842 
Code
cat("\nClassification \n")

Classification 
Code
table(working_dataset_result$model.final)

 Bike  Boat   Bus   Car Horse Other Train  Tram  Walk 
 2738   356  2875  4872  5988   954 12624  4835  5283 

The map below compares misclassified points with the ground truth, allowing an exploration of where the model fails.

Code
# Filter Wrong Points
wrong_points <- working_dataset_result |>
  filter(model.final != transport_mode) |>
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)

wrong_points <- wrong_points[,c(2,19,20)] 

# Create Interactive Map to compare false classification
map <- mapview(wrong_points, alpha.regions = 1.0, zcol = "transport_mode", layer.name = "Transport Mode") +
  mapview(wrong_points, alpha.regions = 1.0, zcol = "model.final", legend = FALSE, layer.name = "Model Classification")

map

4. Findings

The preprocessing of GPS data plays a crucial role in the classification results. While individually computed parameters such as velocity, acceleration, and sinuosity provide valuable information, they are insufficient to build a robust model on their own. Applying moving window functions to these parameters, however, can greatly enhance the accuracy of the model.6,7
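The moving-window smoothing of a computed parameter can be sketched with a centred moving average in base R; the `velocity` vector and the helper name are illustrative assumptions, not the project's actual implementation:

```r
# Minimal sketch: smooth a noisy per-point feature (e.g. velocity) with a
# centred moving average before feeding it to the classifier.
smooth_feature <- function(x, k = 5) {
  # stats::filter with a uniform kernel is a centred moving average;
  # edge positions where the window does not fit become NA
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

v <- c(1, 1, 12, 1, 1, 2, 2, 2)  # a single GPS velocity spike at position 3
smooth_feature(v, k = 3)
```

The spike is spread over its neighbours rather than standing out as an implausible single-point velocity, which is exactly why windowed features separate the transport modes better than raw per-point values.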

To effectively differentiate between similar classes like buses, trams, cars, bikes, and boats, additional parameters need to be considered. For instance, incorporating the distance to public traffic networks specific to each transport mode can significantly improve the accuracy of the model. These additional parameters provide valuable contextual information that aids in distinguishing between similar classes.
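A distance-to-network feature of this kind can be sketched with the sf package (already used elsewhere in this analysis); the coordinates and the single stand-in tram line below are made up for illustration, whereas the project would use the cantonal network layers:

```r
library(sf)

# Two example GPS points, projected to a metric CRS (CH1903+/LV95)
pts <- st_as_sf(data.frame(lon = c(8.540, 8.548), lat = c(47.370, 47.376)),
                coords = c("lon", "lat"), crs = 4326) |>
  st_transform(2056)

# A single stand-in tram line in the same CRS
tram_lines <- st_sfc(st_linestring(rbind(c(2683000, 1247000),
                                         c(2684000, 1248000))), crs = 2056)

# st_distance returns a points-by-lines matrix; keep the row-wise minimum
# as the "distance to nearest tram line" feature (in metres)
pts$dist_tram <- apply(st_distance(pts, tram_lines), 1, min)
```

Repeating this per network layer (tram, rail, bus, roads, water) yields one contextual feature per candidate transport mode.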

In urban settings, distinguishing between bus, tram, and car travel poses a particular challenge due to the characteristic stop-and-go movement patterns. The frequent fluctuations between low velocities and accelerations make it difficult to discern the specific class. These movement patterns can correspond to multiple classes and create ambiguity in the classification process. By addressing these challenges model accuracy can be enhanced.

5. Discussion

To enhance the overall classification accuracy, a more systematic approach to testing parameters and their impact on the classification results is needed. This includes exploring different preprocessing techniques, employing diverse models, and implementing appropriate post-processing steps. In particular, the moving window size, which imparts a smoothing effect on the computed parameters, and the hyperparameters of the SVM models could benefit from further refinement given more computational power.
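Such an SVM hyperparameter search could be sketched with `tune.svm()` from the e1071 package, here run on the built-in iris data as a stand-in for the labelled tracking features (the grid values are illustrative):

```r
library(e1071)

# Grid search over (gamma, cost) with 10-fold cross-validation per combination;
# iris stands in for the labelled feature data set
set.seed(42)
tuned <- tune.svm(Species ~ ., data = iris,
                  gamma = 10^(-2:0), cost = 10^(-1:2))

tuned$best.parameters  # the (gamma, cost) pair with the lowest CV error
```

The cross-validated error surface in `tuned$performances` also shows how sensitive the model is to each hyperparameter, which helps decide where a finer grid is worth the compute.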

In related studies on transport mode detection, segmentation has been successfully applied to the data.8,9 In this project, point data was used to investigate whether the classification model could autonomously identify distinct segments. Preliminary results suggest that the model often identifies segments, but further analysis is necessary to validate these findings. Furthermore, Biljecki et al.10 proposed categorizing transport modes into land, water, and air travel and classifying each group individually. This approach was not implemented here; instead, incorporating distance-to-water calculations makes it possible to identify boat travel within the same model as land travel.

To improve the quality of the GPS data, several avenues could be explored. One approach is a shorter sampling interval, capturing data points more frequently. Additionally, supplementing GPS data with accelerometer data, as demonstrated by Roy et al.,11 has been shown to enhance model performance, reaching accuracies of approximately 90%.

Ultimately, the quality of the GPS data is the key weakness of our model; despite this limitation, it is shown that transport modes can be classified with an accuracy of approximately 85%.

Code
wordcountaddin::text_stats("index.qmd")
Method           koRpus        stringi
Word count       2955          2886
Character count  19190         19233
Sentence count   197           Not available
Reading time     14.8 minutes  14.4 minutes

References

Biljecki, Filip, Hugo Ledoux, and Peter van Oosterom. “Transportation Mode-Based Segmentation and Classification of Movement Trajectories.” International Journal of Geographical Information Science 27 (February 2013): 385–407. doi:10.1080/13658816.2012.692791.
Amt für Geoinformation Kt. Bern. “Öffentlicher Verkehr,” 2023.
Amt für Raumentwicklung Kt. Zürich. “Linien Des Öffentlichen Verkehrs,” 2022.
Federal Bureau of Transport FOT. “Öffentlicher Verkehr,” 2023.
Federal Office of Topography swisstopo. “swissTLM3D,” 2023.
Roy, Avipsa, Daniel Fuller, Kevin Stanley, and Trisalyn Nelson. “Classifying Transport Mode from Global Positioning Systems and Accelerometer Data: A Machine Learning Approach.” Findings, September 2020. doi:10.32866/001c.14520.

Footnotes

  1. Federal Office of Topography swisstopo, “swissTLM3D,” 2023.↩︎

  2. Federal Bureau of Transport FOT, “Öffentlicher Verkehr,” 2023.↩︎

  3. Amt für Raumentwicklung Kt. Zürich, “Linien Des Öffentlichen Verkehrs,” 2022.↩︎

  4. Amt für Geoinformation Kt. Bern, “Öffentlicher Verkehr,” 2023.↩︎

  5. Federal Office of Topography swisstopo, “swissTLM3D”.↩︎

  6. Filip Biljecki, Hugo Ledoux, and Peter van Oosterom, “Transportation Mode-Based Segmentation and Classification of Movement Trajectories,” International Journal of Geographical Information Science 27 (February 2013): 385–407, doi:10.1080/13658816.2012.692791.↩︎

  7. Avipsa Roy et al., “Classifying Transport Mode from Global Positioning Systems and Accelerometer Data: A Machine Learning Approach,” Findings, September 2020, doi:10.32866/001c.14520.↩︎

  8. Biljecki, Ledoux, and Oosterom, “Transportation Mode-Based Segmentation and Classification of Movement Trajectories”.↩︎

  9. Roy et al., “Classifying Transport Mode from Global Positioning Systems and Accelerometer Data”.↩︎

  10. “Transportation Mode-Based Segmentation and Classification of Movement Trajectories”.↩︎

  11. “Classifying Transport Mode from Global Positioning Systems and Accelerometer Data”.↩︎